
Overview

The ChartsMaze EDL (Enterprise Data Lake) Pipeline is a comprehensive data integration system for NSE stock market data. With a single command, it produces a complete dataset of 2,775 stocks with 86 fields per stock, combining fundamental analysis, technical indicators, corporate events, and real-time market intelligence.

Quick Start

Get your first pipeline run in under 5 minutes

Installation

Set up Python environment and dependencies

Field Reference

Explore all 86 output fields and data sources

Pipeline Settings

Customize pipeline behavior and output

What You Get

The pipeline outputs a single compressed file: all_stocks_fundamental_analysis.json.gz

Output Highlights

  • 2,775 NSE stocks with complete coverage
  • 86 fields per stock across 13 categories
  • ~2-4 MB compressed (from 30+ MB raw JSON)
  • 4-8 minute runtime (excluding optional OHLCV fetch)
  • Single command execution via run_full_pipeline.py
python3 run_full_pipeline.py
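Once the run finishes, the compressed output can be read with Python's standard library alone. The following is a minimal sketch: the helper name `load_pipeline_output` is hypothetical, and because this page does not state whether the top level of the JSON is a list or a dict, the function simply returns whatever it parses.

```python
import gzip
import json


def load_pipeline_output(path="all_stocks_fundamental_analysis.json.gz"):
    """Read the gzip-compressed JSON output into Python objects.

    Hypothetical helper, not part of the pipeline itself.
    """
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `load_pipeline_output()` from the directory that holds the output file returns the parsed dataset; inspect its type and length first before assuming a particular layout.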

Key Features

Comprehensive Fundamentals

Quarterly results, P/E ratios, ROE, ROCE, sales growth, profit margins, and 5-year trends

Advanced Technical Indicators

RSI, MACD, SMA/EMA status, pivot points, ADR, RVOL, ATH tracking, and volume analysis

Corporate Events Tracking

Dividends, bonus issues, stock splits, rights issues, and upcoming results announcements

Regulatory Filings

Company filings via hybrid LODR + Legacy endpoints with deduplication

Real-Time News Feed

AI-sentiment tagged news (50 articles per stock) from live market sources

Market Intelligence

ASM/GSM surveillance lists, circuit stocks, bulk/block deals, and price band revisions

Historical OHLCV Data

Lifetime daily candles with smart incremental updates (optional, ~30 min first run)

Post-Earnings Analytics

Returns since earnings, max gains post-results, and quarterly performance tracking

F&O Enrichment

F&O flag, lot sizes, and next expiry dates for derivatives-enabled stocks

Automated Dependency Management

18-script pipeline with strict phase ordering and error resilience

Pipeline Architecture

The pipeline operates in 6 phases with strict dependency ordering:

Phase 1: Core Data

Foundation layer - fetches 2,775 stocks and creates master ISIN map
  • fetch_dhan_data.py → master_isin_map.json + dhan_data_response.json
  • fetch_fundamental_data.py → fundamental_data.json (35 MB)

Phase 2: Data Enrichment

Parallel fetch of 10+ data sources (company filings, announcements, indicators, news, corporate actions, surveillance lists, etc.)
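A parallel fetch of independent sources is commonly built on a thread pool. The sketch below illustrates the idea only; it is not the pipeline's actual code. The function name `run_parallel`, the `fetchers` mapping, and the default of 4 workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_parallel(fetchers, max_workers=4):
    """Run independent fetch callables concurrently.

    `fetchers` maps a source name to a zero-argument callable (hypothetical).
    Returns (results, errors), both keyed by source name.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in fetchers.items()}
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                results[name] = fut.result()
            except Exception as exc:
                # One failed source is recorded and does not abort the others.
                errors[name] = exc
    return results, errors
```

Collecting errors per source instead of raising keeps one unavailable endpoint from discarding the data that was fetched successfully.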

Phase 2.5: OHLCV History (Optional)

Smart incremental download of lifetime daily candles
  • First run: ~30 minutes
  • Incremental updates: 2-5 minutes
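The gap between the first run and later runs follows from only requesting candles newer than the ones already stored. The sketch below shows one way such an incremental update could work. The function names, the `listing_date` fallback, and the candle layout (a dict with a `"date"` key) are assumptions, since this page does not describe the stored format.

```python
from datetime import date, timedelta


def next_fetch_start(stored_dates, listing_date):
    """Return the first date that still needs to be downloaded."""
    if not stored_dates:
        # Nothing stored yet: a first run fetches the full history.
        return listing_date
    return max(stored_dates) + timedelta(days=1)


def merge_candles(existing, fetched):
    """Merge candles keyed by date; freshly fetched rows replace stored ones."""
    merged = {c["date"]: c for c in existing}
    merged.update({c["date"]: c for c in fetched})
    return [merged[d] for d in sorted(merged)]
```

Keying the merge by date makes the update idempotent, so re-running it after an interrupted download does not duplicate candles.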

Phase 3: Base Analysis

bulk_market_analyzer.py builds the master JSON structure with 60+ base fields

Phase 4: Enrichment

Sequential in-place modification of master JSON (order matters!)
  1. Advanced metrics (ADR, RVOL, ATH, Turnover)
  2. Earnings performance (post-results returns)
  3. F&O data (lot sizes, expiry dates)
  4. Corporate events + news feed (MUST BE LAST)
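Because each step modifies the master JSON in place, the order can be pinned in a single list rather than left to the caller. The sketch below shows that pattern. The step names and the `run_enrichment` function are hypothetical labels for the four steps listed above, not the pipeline's real script names.

```python
# Hypothetical step labels, in the documented order.
ENRICHMENT_ORDER = [
    "advanced_metrics",
    "earnings_performance",
    "fno_data",
    "corporate_events_news",  # documented as the step that must run last
]


def run_enrichment(master, step_funcs):
    """Apply every enrichment step to `master` in place, in the fixed order.

    `step_funcs` maps a step label to a callable that mutates `master`.
    """
    missing = [name for name in ENRICHMENT_ORDER if name not in step_funcs]
    if missing:
        # Refuse to run a partial sequence instead of silently skipping steps.
        raise KeyError(f"missing enrichment steps: {missing}")
    for name in ENRICHMENT_ORDER:
        step_funcs[name](master)
    return master
```

Driving the loop from `ENRICHMENT_ORDER` rather than from the dictionary means the sequence does not depend on how the caller assembled `step_funcs`.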

Phase 5: Compression

GZIP compression at level 9 (30 MB → 2-4 MB, roughly 87-93% reduction)
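Level-9 GZIP compression is available in Python's standard `gzip` module through the `compresslevel` argument. The sketch below writes JSON at that level and reports the size reduction. The function name `compress_json` is hypothetical, and the real pipeline may serialize its data differently.

```python
import gzip
import json
import os


def compress_json(data, path):
    """Write `data` as gzip-compressed JSON at compression level 9.

    Returns (raw_bytes, compressed_bytes, reduction_percent).
    """
    raw = json.dumps(data).encode("utf-8")
    with gzip.open(path, "wb", compresslevel=9) as fh:
        fh.write(raw)
    compressed_size = os.path.getsize(path)
    reduction_pct = 100 * (1 - compressed_size / len(raw))
    return len(raw), compressed_size, reduction_pct
```

JSON with many repeated keys compresses well, which is why a per-stock record layout shrinks so sharply.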

Data Categories (86 Fields)

  • Symbol, Name, Listing Date
  • Basic Industry, Sector, Index
  • Quarterly results: Net Profit, EPS, Sales, OPM (Latest, Previous, 2Q, 3Q, Last Year)
  • QoQ % and YoY % for all metrics
  • Sales Growth 5 Years, EPS history
  • Market Cap, Stock Price, P/E, Forward P/E, Historical P/E 5
  • PEG, ROE, ROCE, D/E, OPM TTM
  • FII % change QoQ, DII % change QoQ
  • Free Float %, Float Shares
  • RSI (14), SMA Status (20, 50, 200), EMA Status (20, 200)
  • Technical Sentiment, Pivot Point
  • Returns: 1 Day, 1 Week, 1 Month, 3M, 6M, 1Y
  • % from 52W High/Low, % from ATH, Gap Up %, Day Range
  • RVOL (Relative Volume vs 20-day avg)
  • 200 Days EMA Volume, % from 52W High Volume
  • Daily Rupee Turnover (20/50/100 day MA)
  • 30 Days Average Rupee Volume
  • 5/14/20/30 Days MA ADR (Average Daily Range)
  • Current circuit limit (e.g., 2%, 5%, 10%, 20%)
  • Quarterly Results Date
  • Returns since Earnings %, Max Returns since Earnings %
9 event types with icons and dates:
  • ★ LTASM/STASM | 📊 Results Recently Out | 🔑 Insider Trading
  • 📦 Block Deal | # +/- Revision | ⏰ Results (DD-Mon)
  • 🎁 Bonus | ✂️ Split | 💸 Dividend | 📈 Rights
Top 5 regulatory filings with Date, Headline, PDF URL
Top 5 real-time news with Title, Sentiment, Date
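Two of the volume and range fields above can be illustrated with short calculations. The sketch below rests on common definitions, which may differ from the pipeline's exact formulas: RVOL is taken as the latest volume divided by the mean of the preceding 20 sessions, and ADR as the mean of (high - low) / low in percent over the chosen window. Both function names are hypothetical.

```python
def rvol(volumes, window=20):
    """Latest volume divided by the mean of the previous `window` sessions.

    Assumes the average excludes the latest session; returns None if
    there is not enough history.
    """
    if len(volumes) < window + 1:
        return None
    prior = volumes[-(window + 1):-1]
    avg = sum(prior) / window
    return volumes[-1] / avg if avg else None


def adr_pct(highs, lows, window=20):
    """Mean daily (high - low) / low over the last `window` sessions, in percent."""
    if len(highs) != len(lows) or len(highs) < window:
        return None
    ranges = [
        (h - l) / l * 100
        for h, l in zip(highs[-window:], lows[-window:])
    ]
    return sum(ranges) / window
```

Passing `window=5`, `14`, `20`, or `30` to `adr_pct` would correspond to the four ADR variants listed above, under the same assumed definition.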

Runtime Performance

Typical execution times (2024 MacBook Pro M1, 100 Mbps internet):
  • Without OHLCV: 4-6 minutes
  • With OHLCV (incremental): 6-10 minutes
  • With OHLCV (first run): 30-40 minutes
The pipeline uses rate limiting and thread pools to avoid overwhelming API endpoints. Do not modify concurrency settings without understanding the rate limit implications.
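To see why concurrency and rate limits interact, consider a limiter shared by every worker thread: adding threads does not raise throughput past the configured rate, while removing the limiter lets every thread hit the endpoint at once. The class below is a minimal sketch of such a shared limiter. It is an assumption for illustration, not the mechanism the pipeline actually uses.

```python
import threading
import time


class RateLimiter:
    """Allow at most `rate` calls per second across all threads (sketch)."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        """Block until this caller's reserved time slot arrives."""
        with self.lock:
            now = time.monotonic()
            slot = max(self.next_slot, now)
            # Reserve the following slot for the next caller.
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            # Sleep outside the lock so other threads can reserve slots.
            time.sleep(delay)
```

Each worker would call `limiter.wait()` immediately before issuing a request, so the combined request rate stays bounded regardless of the thread count.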

Output Files

Primary Output

all_stocks_fundamental_analysis.json.gz  # 2-4 MB compressed

Optional Outputs (if FETCH_OPTIONAL = True)

all_indices_list.json          # 194 market indices
etf_data_response.json         # 361 ETFs

Intermediate Files (auto-cleaned if CLEANUP_INTERMEDIATE = True)

master_isin_map.json
fundamental_data.json
advanced_indicator_data.json
all_company_announcements.json
company_filings/               # per-stock filing JSONs
market_news/                   # per-stock news JSONs
ohlcv_data/                    # lifetime daily candles (KEPT)
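The cleanup behavior described above can be sketched as a function that removes the listed intermediate paths and leaves `ohlcv_data/` alone. This is an illustration under assumptions: the function name, the `base_dir` parameter, and the `enabled` flag (standing in for `CLEANUP_INTERMEDIATE`) are hypothetical, and the pipeline may clean up additional files not listed on this page.

```python
import shutil
from pathlib import Path

# Paths taken from the list above; ohlcv_data/ is deliberately absent
# because the page marks it as KEPT.
INTERMEDIATE_PATHS = [
    "master_isin_map.json",
    "fundamental_data.json",
    "advanced_indicator_data.json",
    "all_company_announcements.json",
    "company_filings",
    "market_news",
]


def cleanup_intermediate(base_dir, enabled=True):
    """Delete intermediate files and directories; return the names removed."""
    removed = []
    if not enabled:
        return removed
    for name in INTERMEDIATE_PATHS:
        target = Path(base_dir) / name
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        else:
            continue  # nothing to remove for this entry
        removed.append(name)
    return removed
```

Using an explicit allow-list of paths, rather than deleting everything except the primary output, avoids removing the OHLCV history that incremental updates depend on.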

Next Steps

Run Your First Pipeline

Follow the quickstart guide to produce your first dataset

Install Dependencies

Set up Python and required packages